Skip to content

fix(parsers/python): match imports by module components in both directions#116

Open
gadievron wants to merge 1 commit into
masterfrom
fix/python-import-overmatch
Open

fix(parsers/python): match imports by module components in both directions#116
gadievron wants to merge 1 commit into
masterfrom
fix/python-import-overmatch

Conversation

@gadievron

@gadievron gadievron commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Root cause

_resolve_import Strategy 2 (parsers/python/call_graph_builder.py) bound an imported function to a file whose dotted module path string-endswith the imported module (in either direction). That crosses component boundaries: from utils import helper matched extra_utils.py:helper because 'extra_utils'.endswith('utils'), and from auth import login matched extra_auth.py:login. No Python module layout produces those — they are false call-graph edges to the wrong module, and they inflate the reachable set with units that are never really called.

Fix

Match by dotted components, in either direction:

  • the imported module is a component-suffix of the file path — a repo-root prefix on the file (src/pkg/auth.py for from pkg.auth import ...); or
  • the file path is a component-suffix of the imported module — the repo-IS-the-package self-import (auth.py == myapp/auth.py recorded shallow via relative_to(repo_path), for from myapp.auth import ...).

Both are real layouts and are kept. Only the substring-crossing matches are dropped. Relative/bare imports (from . import x, no module) keep the prior name-only behavior.

Reachability / recall safety

An adversarial review caught that an earlier forward-only version dropped the reverse direction (repo-is-package self-imports) — a recall regression. Because the reachability filter keeps only reachable units, a dropped real edge silently removes attack surface. So an edge that could be real — from pkg.auth import login to a top-level auth.py, which is structurally identical to a self-import and indistinguishable from the data — is kept, not forbidden. Net: strictly fewer false edges than master, never fewer real ones.

Regression test

tests/parsers/python/test_call_graph_import_resolution.py — RED→GREEN:

  • 2 precision drops: substring-crossing extra_utils/utils, extra_auth/auth.
  • 6 recall guards: exact package path, repo-root prefix, top-level, repo-is-package self-import, nested reverse-prefix, relative import.

Full Python suite: 12 passed.

Compatibility

None. Resolution only; the change is strictly more precise on false edges and recall-preserving on real ones.

Coordination with open PRs

This edits the _resolve_import Strategy-2 block in parsers/python/call_graph_builder.py, which #72 also reworks (it adds a seen param for __init__.py re-exports) and #112 also touches. Neither fixes this bug — both still carry the bare module_path.endswith(expected_module) or expected_module.endswith(module_path). Expect a merge conflict in that block; happy to rebase onto whichever of #72/#112 lands first.

…tions

`_resolve_import` Strategy 2 bound an imported function to a file whose dotted
module path string-`endswith` the imported module (either direction). That crosses
component boundaries: `from utils import helper` matched `extra_utils.py`
(`'extra_utils'.endswith('utils')`) — a false call-graph edge to the wrong module
that no Python layout can produce.

Fix: match by dotted COMPONENTS, in EITHER direction — the imported module is a
component-suffix of the file path (a repo-root prefix on the file, `src/pkg/auth.py`
for `from pkg.auth import ...`), OR the file path is a component-suffix of the
imported module (the repo-IS-the-package self-import, `auth.py` == `myapp/auth.py`
recorded shallow via relative_to(repo_path), for `from myapp.auth import ...`).
Both are real layouts and are KEPT; only the substring-crossing matches are dropped.

An adversarial review caught that an earlier forward-only version dropped the
reverse direction (repo-is-package self-imports) — a recall regression. Because the
reachability filter keeps only reachable units, a dropped real edge silently removes
attack surface; so an edge that COULD be real (e.g. `from pkg.auth import login` to a
top-level `auth.py`, structurally identical to a self-import) is kept, not forbidden.
Net: strictly fewer false edges than master, never fewer real ones.

Tests (tests/parsers/python/test_call_graph_import_resolution.py): 2 precision drops
(substring-crossing `extra_utils`/`utils`, `extra_auth`/`auth`) + 6 recall guards
(exact, repo-root-prefix, top-level, repo-is-package self-import, nested reverse-prefix,
relative). Full python suite: 12 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gadievron

Copy link
Copy Markdown
Collaborator Author

Merge-order note (not a defect — flagging for landing order)

parsers/python/call_graph_builder.py here overlaps #112 (same Strategy-2 import-resolution block). Resolve as a union, not accept-ours/theirs:

  • keep Python parser: fix 10 finder bugs #112's name check func_data.get('name') == func_name (match the called name, not the module-path tail parts[-1]);
  • keep this PR's component-based module match in both directions (replacing the old cross-boundary endswith).

Whichever lands second will conflict on that one if line; keep both fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant